55 outlier detection #105
…dvanced graphs notebook
Can we link from the top-level README, in a new section "components", to the notebook describing the outlier detector?
@@ -25,7 +25,7 @@
 "cell_type": "code",
 "execution_count": null,
 "metadata": {
-"collapsed": true
+"collapsed": false
There seems to be a missing proto compile/copy step like the one used in the other notebooks. Running the notebook gives this error:
ImportError Traceback (most recent call last)
in ()
1 import requests
2 from requests.auth import HTTPBasicAuth
----> 3 from proto import prediction_pb2
4 from proto import prediction_pb2_grpc
5 import grpc
ImportError: cannot import name prediction_pb2
Did you build the protos locally first, using the Makefile in the notebooks directory?
I always add the following to the notebooks so that they are self-contained:
!cp ../proto/prediction.proto ./proto
!python -m grpc.tools.protoc -I. --python_out=. --grpc_python_out=. ./proto/prediction.proto
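For anyone who would rather do the same copy-and-compile from plain Python instead of notebook shell escapes, here is a minimal sketch. It assumes `grpcio-tools` is installed (recent releases expose the compiler as `grpc_tools.protoc`) and that the notebook runs from a directory where `../proto/prediction.proto` exists; the helper names `protoc_command` and `compile_prediction_proto` are hypothetical, not part of the repo.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def protoc_command(proto_file, include_dir="."):
    """Argument list equivalent to the notebook's `python -m ... protoc` line.

    Hypothetical helper; assumes the grpcio-tools module name `grpc_tools.protoc`.
    """
    return [
        sys.executable, "-m", "grpc_tools.protoc",
        f"-I{include_dir}",
        "--python_out=.",
        "--grpc_python_out=.",
        proto_file,
    ]


def compile_prediction_proto(src="../proto/prediction.proto", dest_dir="proto"):
    """Copy prediction.proto into dest_dir and generate prediction_pb2*.py next to it."""
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    shutil.copy(src, dest_dir)  # same as `!cp ../proto/prediction.proto ./proto`
    proto_file = (Path(dest_dir) / "prediction.proto").as_posix()
    # check=True raises CalledProcessError if protoc reports an error.
    subprocess.run(protoc_command(proto_file), check=True)
```

After `compile_prediction_proto()`, `from proto import prediction_pb2` should resolve. Shelling out to `python -m` keeps the invocation identical to the notebook line rather than relying on the compiler's in-process API.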
Added
@@ -569,7 +569,7 @@
 "* Two models\n",
 "\n",
 "The outlier detector is a special kind of transformer that will populate a tag in the response metadata with the outlier score it has calculated. \n",
-"We use the docker image seldonio/mock_outlier_detector:1.0 for the outlier detector.\n",
+"We use the docker image seldonio/outlier_mahalanobis:0.2 for the outlier detector.\n",
Is it worth adding some explanation of the values returned by this test? Are the outlier scores meant to be useful here?
No. Since the features sent are meaningless, the scores can't really be interpreted.
Actually, if you always send the same 2 points, as is the case with the rest_request, you will always see an outlier score of 0.
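If someone wants to inspect the returned value anyway (for example, to confirm the constant 0 described above), a minimal sketch for pulling the score out of a prediction response follows. It assumes a SeldonMessage-style JSON body with a `meta` -> `tags` mapping; the tag key `outlierScore` and the function name are hypothetical, so check the notebook output for the actual key.

```python
import json


def outlier_score(response, tag="outlierScore"):
    """Return meta.tags[tag] from a prediction response, or None if it is missing.

    ASSUMPTION: SeldonMessage-style JSON ({"meta": {"tags": {...}}, "data": ...});
    the tag key "outlierScore" is hypothetical.
    """
    if isinstance(response, str):
        response = json.loads(response)
    tags = (response.get("meta") or {}).get("tags") or {}
    return tags.get(tag)
```

With the `requests` library this would be used as `outlier_score(r.json())` on the response object `r`.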
"source": [
"The output of the algorithm (outlier score) is a measure of distance from the center of the features distribution (Mahalanobis distance). The algorithm is online, which means that it starts without knowledge about the distribution of the features and learns as requests arrive. Consequently you should expect the output to be bad at the start and to improve over time. \n",
"\n",
"The output being a real positive number, we leave it to the user to decide on a threshold for when a point will be consider to be an outlier.\n",
typo - considered
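Since the threshold is left to the user, one simple option is to take a high quantile of previously observed scores, skipping the first few because the online estimate starts out poor. This is only a sketch: the quantile rule, the default `q=0.99`, the `warmup` parameter and the function names are assumptions, not something the detector provides.

```python
import numpy as np


def quantile_threshold(past_scores, q=0.99, warmup=0):
    """Threshold = q-th quantile of past scores, ignoring the first `warmup` ones.

    ASSUMPTION: a quantile rule is one possible user-side choice, not part of the detector.
    """
    scores = np.asarray(past_scores, dtype=float)[warmup:]
    if scores.size == 0:
        raise ValueError("no scores left after the warm-up period")
    return float(np.quantile(scores, q))


def flag_outliers(scores, threshold):
    """Boolean mask that is True where a score is strictly above the threshold."""
    return np.asarray(scores, dtype=float) > threshold
```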
"As observations arrive, the algorithm will:\n",
"- Keep track and update the mean and sample covariance matrix of the dataset\n",
"- Apply a principal component analysis using these moments and project the new observations on the first 3 principal components (default value, can be changed)\n",
"- Compute the Mahalanobis distance from this projections to the projected mean\n"
typo - projection
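The three steps listed in the notebook can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the `seldonio/outlier_mahalanobis` code: it uses Welford-style running moments, an `eigh`-based PCA on the running covariance, scores each point after folding it into the moments, and guards near-zero eigenvalues with a small `eps`. The class name and parameters are hypothetical; `n_components=3` mirrors the default mentioned in the notebook.

```python
import numpy as np


class OnlineMahalanobis:
    """Sketch of an online Mahalanobis outlier scorer on the top principal components.

    ASSUMPTIONS (not taken from the Seldon implementation): Welford-style running
    moments, eigh-based PCA, and scoring each point after it updates the moments.
    """

    def __init__(self, n_features, n_components=3, eps=1e-12):
        self.n = 0
        self.mean = np.zeros(n_features)
        self.m2 = np.zeros((n_features, n_features))  # running sum of centred outer products
        self.n_components = min(n_components, n_features)
        self.eps = eps

    def update(self, x):
        """Fold one observation into the running mean and scatter matrix."""
        x = np.asarray(x, dtype=float)
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += np.outer(delta, x - self.mean)

    def covariance(self):
        """Sample covariance (denominator n - 1); all zeros until two points are seen."""
        if self.n < 2:
            return np.zeros_like(self.m2)
        return self.m2 / (self.n - 1)

    def score(self, x):
        """Mahalanobis distance from the mean, measured on the top principal components."""
        eigvals, eigvecs = np.linalg.eigh(self.covariance())
        top = np.argsort(eigvals)[::-1][: self.n_components]  # largest variances first
        lam = np.maximum(eigvals[top], self.eps)  # avoid dividing by ~0 early on
        z = eigvecs[:, top].T @ (np.asarray(x, dtype=float) - self.mean)
        # In the PCA basis the projected covariance is diagonal, so the distance is a weighted norm.
        return float(np.sqrt(np.sum(z ** 2 / lam)))

    def update_and_score(self, batch):
        """Update with, then score, each row of the batch in arrival order."""
        scores = []
        for x in batch:
            self.update(x)
            scores.append(self.score(x))
        return np.array(scores)
```

Recomputing the eigendecomposition for every point costs O(d^3), which is fine for a sketch but is the kind of cost the rank-one trick described in the notebook is meant to reduce.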
"cell_type": "markdown",
"metadata": {},
"source": [
"To compute the outlier score of each point in the new batch, we need the inverse of the covariance matrix of all the points up to this one. This means inverting $b$ matrices. We made this operation faster by leveraging the fast that each covariance matrix is a rank one update of the previous one. \n",
typo - "the fast"
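The rank-one observation can be made concrete with the Sherman-Morrison formula. Adding the n-th point x changes the scatter matrix S = sum_i (x_i - mu)(x_i - mu)^T by ((n - 1) / n) u u^T with u = x - mu_old, so its inverse can be updated in O(d^2) instead of re-inverting in O(d^3). This is a sketch assuming that formula is the trick meant here (the notebook does not name it); it works on the full covariance rather than the PCA projection, and the class and function names are hypothetical.

```python
import numpy as np


def sherman_morrison(a_inv, u, c=1.0):
    """Return (A + c u u^T)^-1 from A^-1 in O(d^2); assumes A (hence A^-1) is symmetric."""
    au = a_inv @ u
    return a_inv - c * np.outer(au, au) / (1.0 + c * (u @ au))


class RunningInverseCovariance:
    """Hypothetical helper: running mean and inverse sample covariance.

    Seeded with an initial batch whose scatter matrix is invertible
    (e.g. more points than features); every later point is one rank-one update.
    """

    def __init__(self, initial_batch):
        x0 = np.asarray(initial_batch, dtype=float)
        self.n = x0.shape[0]
        self.mean = x0.mean(axis=0)
        centred = x0 - self.mean
        self.scatter_inv = np.linalg.inv(centred.T @ centred)

    def update(self, x):
        x = np.asarray(x, dtype=float)
        self.n += 1
        u = x - self.mean  # deviation from the previous mean
        # S_n = S_{n-1} + ((n - 1) / n) u u^T, so one Sherman-Morrison step suffices.
        self.scatter_inv = sherman_morrison(self.scatter_inv, u, c=(self.n - 1) / self.n)
        self.mean += u / self.n

    def cov_inv(self):
        """Inverse of the sample covariance S_n / (n - 1)."""
        return (self.n - 1) * self.scatter_inv

    def mahalanobis(self, x):
        """Mahalanobis distance of x from the running mean."""
        d = np.asarray(x, dtype=float) - self.mean
        return float(np.sqrt(d @ self.cov_inv() @ d))
```

Processing a batch of b points is then b cheap updates in arrival order, each giving the inverse covariance of all points up to that one.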
@@ -1,5 +1,5 @@
 IMAGE_NAME=docker.io/seldonio/core-python-wrapper
-IMAGE_VERSION=0.7
+IMAGE_VERSION=0.8
Have we updated all docs to version 0.8?
Not yet. These changes only affect someone who wants to build and wrap an outlier detector, but that isn't documented anywhere at the moment.
No description provided.